feat: integrate S3 for dataset with compatibility #5941
There is too much information in the pull request to test.
Pull Request Overview
This PR implements S3-backed storage for dataset files and images with presigned upload/download URLs, JWT-based proxy access, and maintains backward compatibility with GridFS. The implementation introduces a comprehensive S3 integration layer while preserving existing functionality.
Key Changes
- Adds S3 storage integration with an `S3DatasetSource` class providing upload, download, delete, and metadata operations for dataset files
- Implements a JWT-signed proxy endpoint for secure S3 object streaming, plus presigned URL generation APIs
- Updates the dataset processing pipeline to handle S3 keys alongside GridFS IDs, including TTL management and parsed image cleanup
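
Because `fileId` can now hold either an S3 key or a GridFS ID, read and delete paths need some way to tell the two apart. The following is a minimal sketch of one way to do that, not the code from this PR. The `classifyFileId` helper and the `FileStorageKind` type are hypothetical names, and the sketch assumes that dataset S3 keys start with `dataset/` and that GridFS IDs are serialized as 24-character hexadecimal ObjectIds.

```typescript
// Hypothetical helper for illustration; not taken from the FastGPT codebase.
type FileStorageKind = "s3" | "gridfs" | "unknown";

// Assumption: S3 object keys for dataset files start with "dataset/".
const S3_DATASET_PREFIX = "dataset/";

// A MongoDB ObjectId (used by GridFS) serializes to 24 hexadecimal characters.
const OBJECT_ID_PATTERN = /^[0-9a-fA-F]{24}$/;

function classifyFileId(fileId: string): FileStorageKind {
  // Check the S3 prefix first: an S3 key can never match the ObjectId pattern
  // because it contains "/" characters.
  if (fileId.startsWith(S3_DATASET_PREFIX)) return "s3";
  if (OBJECT_ID_PATTERN.test(fileId)) return "gridfs";
  return "unknown";
}

// Example dispatch: callers can branch on the result instead of guessing.
function describeStorage(fileId: string): string {
  switch (classifyFileId(fileId)) {
    case "s3":
      return "delete or read through the S3 source";
    case "gridfs":
      return "delete or read through the legacy GridFS bucket";
    default:
      return "reject: unrecognized file identifier";
  }
}
```

Returning an explicit `"unknown"` rather than defaulting to one backend keeps malformed identifiers from being routed silently to the wrong store.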
Reviewed Changes
Copilot reviewed 64 out of 66 changed files in this pull request and generated 40 comments.
| File | Description |
|---|---|
| `projects/app/src/pages/api/system/file/[jwt].ts` | New JWT-authenticated proxy endpoint for streaming S3 objects |
| `projects/app/src/pages/api/core/dataset/presignDatasetFilePostUrl.ts` | API for generating presigned upload URLs with authentication |
| `projects/app/src/pages/api/core/dataset/presignDatasetFileGetUrl.ts` | API for generating presigned download URLs supporting both collections and direct keys |
| `packages/service/common/s3/sources/dataset/index.ts` | Core `S3DatasetSource` class implementing dataset file operations |
| `packages/service/common/s3/utils.ts` | JWT signing/verification utilities and S3 TTL management helpers |
| `packages/service/core/dataset/read.ts` | Updated file reading logic to support S3 keys and parsed image uploads |
| `projects/app/src/pageComponents/dataset/detail/Import/diffSource/FileLocal.tsx` | Frontend upload flow migrated to presigned POST |
| `projects/app/src/components/Markdown/img/Image.tsx` | Markdown image component with S3 presigned URL resolution |
| `packages/service/core/dataset/collection/schema.ts` | Schema change: `fileId` is now a `String` type supporting both GridFS IDs and S3 keys |
| `packages/service/core/dataset/collection/controller.ts` | Collection deletion updated to clean up S3 files and parsed images |
| `packages/web/i18n/*/chat.json` | Added translation for image collection unsupported error |
| `packages/web/i18n/*/app.json` | Updated file upload tip to reflect S3 storage behavior |
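
The proxy endpoint and the JWT utilities listed above rely on a signed token that names the object to stream. As a rough illustration of that idea, here is a minimal HS256 sign/verify sketch built only on Node's `crypto` module. It is not the implementation in `packages/service/common/s3/utils.ts`: the `FileTokenPayload` shape, the `objectKey` claim, and the function names are all assumptions made for this example.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical payload shape; the PR's actual claims are not shown on this page.
interface FileTokenPayload {
  objectKey: string; // S3 key the token grants read access to
  exp: number; // expiry, in seconds since the Unix epoch
}

const b64url = (data: Buffer): string => data.toString("base64url");
const b64urlJson = (value: unknown): string =>
  b64url(Buffer.from(JSON.stringify(value), "utf8"));

function hmacSha256(data: string, secret: string): Buffer {
  return createHmac("sha256", secret).update(data).digest();
}

// Builds a compact JWT: base64url(header).base64url(payload).base64url(signature)
function signFileToken(payload: FileTokenPayload, secret: string): string {
  const header = b64urlJson({ alg: "HS256", typ: "JWT" });
  const body = b64urlJson(payload);
  const signature = b64url(hmacSha256(`${header}.${body}`, secret));
  return `${header}.${body}.${signature}`;
}

// Returns the payload when the signature is valid and the token has not expired,
// otherwise null.
function verifyFileToken(
  token: string,
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000)
): FileTokenPayload | null {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const [header, body, signature] = parts as [string, string, string];

  // Recompute the signature and compare in constant time.
  const expected = hmacSha256(`${header}.${body}`, secret);
  const actual = Buffer.from(signature, "base64url");
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) {
    return null;
  }

  let claims: unknown;
  try {
    claims = JSON.parse(Buffer.from(body, "base64url").toString("utf8"));
  } catch {
    return null;
  }
  if (typeof claims !== "object" || claims === null) return null;

  const { objectKey, exp } = claims as Partial<FileTokenPayload>;
  if (typeof objectKey !== "string" || typeof exp !== "number") return null;
  if (exp <= nowSeconds) return null;
  return { objectKey, exp };
}
```

A route such as `GET /api/system/file/[jwt]` could call a verifier like this on the path segment and stream the named object only when a payload comes back. In practice a maintained JWT library is usually a safer choice than hand-rolled signing.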
* fix: text split
* remove test
* feat: integrate S3 for dataset with compatibility
* fix: delay s3 files delete timing
* fix: remove imageKeys
* fix: remove parsed images' TTL
* fix: improve codes by pr comments

Co-authored-by: archer <545436317@qq.com>
Note
Adds S3-backed storage for dataset files/images with presigned upload/download, JWT proxy, and end-to-end UI/backend changes while keeping GridFS compatibility.
- `S3DatasetSource` (upload by buffer, get/put/stat, delete by key/prefix, list) and TTL utils; supports dataset/chat/avatar buckets.
- `GET /api/system/file/[jwt]` to stream S3 objects.
- `dataset/...` vs GridFS IDs across delete/read paths.
- `POST /core/dataset/presignDatasetFilePostUrl`, `POST /core/dataset/presignDatasetFileGetUrl` (zod schemas in `global/core/dataset/v2/api.ts`).
- `datasetId` and S3 keys.
- `fileId` in collections is now string (S3 key or GridFS id); collection `file` metadata simplified to `{ filename?, contentLength? }`.
- `dataset/...` and `chat/...` via presigned URLs.

Written by Cursor Bugbot for commit 31bfab7.